[ES|QL] Lookup join and Inline stats support for query approximation #145980

Open
jan-elastic wants to merge 10 commits into elastic:main from jan-elastic:esql-approximate-with-join

@jan-elastic
Contributor

@jan-elastic jan-elastic commented Apr 9, 2026

No description provided.

@jan-elastic jan-elastic added >feature :ml Machine learning Team:ML Meta label for the ML team :ml/ES|QL ML Commands in ES|QL e.g. CATEGORIZE, SAMPLE, CHANGE_POINT etc... v9.5.0 labels Apr 9, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @jan-elastic, I've created a changelog YAML for you.

@jan-elastic jan-elastic force-pushed the esql-approximate-with-join branch from 6726448 to 3bfcab1 Compare April 9, 2026 12:23
@jan-elastic jan-elastic force-pushed the esql-approximate-with-join branch 2 times, most recently from 9f4f8eb to f743ed6 Compare April 10, 2026 07:40
Member

@luigidellaquila luigidellaquila left a comment

Thanks @jan-elastic, from what I can see it looks good. The test coverage is good and the logic looks consistent.

For me it's a 👍, but I'd also ask @astefan and @julian-elastic for feedback.

@jan-elastic jan-elastic force-pushed the esql-approximate-with-join branch 2 times, most recently from 470cecb to d051b06 Compare April 10, 2026 09:37
@jan-elastic jan-elastic removed the :ml/ES|QL ML Commands in ES|QL e.g. CATEGORIZE, SAMPLE, CHANGE_POINT etc... label Apr 10, 2026
@jan-elastic jan-elastic force-pushed the esql-approximate-with-join branch from d9b1230 to 214ca31 Compare April 10, 2026 11:32
@julian-elastic
Contributor

Your change can change the plan, right? I don't see a single test where you actually verify that the plan is correct after your change. Please look at how we add golden tests and add a few, so we can make sure the plan is correct.

verify("ROW i=[1,2,3] | EVAL x=TO_STRING(i) | DISSECT x \"%{x}\" | STATS i=10*POW(PERCENTILE(i, 0.5), 2) | LIMIT 10");
verify("FROM test | URI_PARTS parts = last_name | STATS scheme_count = COUNT() BY parts.scheme | LIMIT 10");
verify("FROM test | REGISTERED_DOMAIN rd = last_name | STATS c = COUNT() BY rd.registered_domain | LIMIT 10");
verify("FROM test | INLINE STATS COUNT() BY last_name | LIMIT 10");
Contributor

Can you please add some tests with views? I know it is not related to this PR, but you don't have any. Views might introduce a fork, which is an n-ary plan, and that might cause issues with finding the first leaf.

Contributor Author

Thanks for pointing that out.

I'm currently working on supporting FORK queries. I will have a look at views then too. Is that okay?


PhysicalPlan child = plan.child().transformUp(LeafExec.class, leaf -> new SampleExec(Source.EMPTY, leaf, plan.sampleProbability()));
// The only non-unary plans that are currently supported are Joins.
// At the moment, the left side of the join is the "expensive" side and
Contributor

How do you guard against someone adding a new binary node in the future and your feature not working correctly? Should we be checking for supported n-ary nodes instead of blindly allowing all n-ary nodes?

Contributor Author

New nodes are not supported for approximation by default, see Approximation.SUPPORTED_COMMANDS.

When someone adds a new binary node to this allowlist, I expect them to make sure that node works end-to-end. Anyway, I'm working on FORK now, which probably requires touching this (join -> only sample first child, fork -> sample all children). So this code won't be there for long.
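Editor's sketch of the rule described in the reply above ("join -> only sample first child, fork -> sample all children"). This is a minimal illustration under stated assumptions, not the code from this PR: every type below (`ToyPlan`, `ToyLeaf`, `ToySample`, `ToyUnary`, `ToyJoin`, `ToyFork`, `SampleInsertion`) is a hypothetical stand-in and none of them are Elasticsearch classes.

```java
import java.util.List;

// Hypothetical toy plan nodes, for illustration only (not the ES|QL plan classes).
interface ToyPlan {}
record ToyLeaf(String name) implements ToyPlan {}
record ToySample(ToyPlan child, double probability) implements ToyPlan {}
record ToyUnary(String name, ToyPlan child) implements ToyPlan {}
record ToyJoin(ToyPlan left, ToyPlan right) implements ToyPlan {}
record ToyFork(List<ToyPlan> children) implements ToyPlan {}

final class SampleInsertion {
    // Wraps source leaves in a sampling node with probability p.
    static ToyPlan insertSample(ToyPlan plan, double p) {
        if (plan instanceof ToyLeaf leaf) {
            return new ToySample(leaf, p);
        }
        if (plan instanceof ToyUnary unary) {
            return new ToyUnary(unary.name(), insertSample(unary.child(), p));
        }
        if (plan instanceof ToyJoin join) {
            // Join: only the left (first) child is sampled; the right side is left untouched.
            return new ToyJoin(insertSample(join.left(), p), join.right());
        }
        if (plan instanceof ToyFork fork) {
            // Fork: every branch is sampled.
            return new ToyFork(fork.children().stream().map(c -> insertSample(c, p)).toList());
        }
        // Anything else is rejected rather than silently passed through.
        throw new IllegalArgumentException("unsupported plan node: " + plan);
    }
}
```

Rejecting unknown node types in the final branch is one way to address the concern about new binary nodes: a node that nobody has reasoned about fails loudly instead of being sampled incorrectly.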

Contributor

@julian-elastic julian-elastic left a comment

Overall looks great! I left some comments, feel free to check in after you address them.

@jan-elastic
Contributor Author

> Your change can change the plan, right? I don't see a single test where you actually verify that the plan is correct after your change. Please look at how we add golden tests and add a few, so we can make sure the plan is correct.

I've added a test to ReplaceSampledStatsBySampleAndStatsTests.

Regarding the golden tests: query approximation doesn't have any yet. I'll take a look at them and add some in a separate PR.

@jan-elastic jan-elastic force-pushed the esql-approximate-with-join branch from 214ca31 to 575db2c Compare April 14, 2026 08:23
Row.class,
Sample.class,
SampledAggregate.class,
StubRelation.class,
Contributor

Might be worth adding a comment mentioning this class specifically. This one is a placeholder node and it should pretty much disappear from the logical plan tree after the optimizations.

Contributor Author

@jan-elastic jan-elastic Apr 14, 2026

This StubRelation survives the optimizations and reaches EsqlSession::executeSubPlans. There subplan execution substitutes them.

I will add a comment though: "Temporary node generated by INLINE STATS"

InlineJoin.class,
Insist.class,
LocalRelation.class,
Join.class,
Contributor

A bit surprised to see Join here and not LookupJoin.

Contributor Author

Me too.

It's caused by this surrogate:
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/join/LookupJoin.java#L67-L71

Apparently it is "to deal with serialization & co".

Maybe it's better to remove this surrogate, add serialization to LookupJoin, and make Join abstract? 🤷 I can't foresee the consequences. WDYT?

Contributor Author

I think this PR solves it: jan-elastic#3

WDYT?

Contributor Author

Looks like Join is more or less synonymous with LookupJoin: the surrogate replaces the LookupJoin with a Join, and LookupJoin itself is not even in PlanWritables.

assert sampleProbability < 1.0;

PhysicalPlan child = plan.child().transformUp(LeafExec.class, leaf -> new SampleExec(Source.EMPTY, leaf, plan.sampleProbability()));
// The only non-unary plans that are currently supported are Joins.
Contributor

Also, maybe along the same lines as Julian's comment: these changes assume joins in general, while we only support lookup joins now. Let's not assume that "join" = "lookup join" for now. I commented similarly above on the list of supported_commands; let's make sure what we support now is as restrictive and realistic as possible.

If we add support for other joins in the future and users start using it and then we realize the support for those future joins needs further design/consideration/testing, adding restrictions means breaking changes.

Contributor Author

I agree it should be as restrictive as possible.

I think the root problem is the issue above (LookupJoin's surrogate making it a general Join).


Contributor Author

@jan-elastic jan-elastic Apr 14, 2026

Here's a quick attempt at fixing it (missing some bwc stuff):
#146213 (it's this PR + 3 more commits)

Contributor

Without deep analysis, making bigger changes to Join.java deserves its own PR. I wouldn't do it here.

I think all you need is "is this join a lookup join?", right? Yeah, we don't make this easy and natural to check. But! Practically, you can indeed check this by walking down the right hand side of the join, picking up the EsRelation, and checking that it has IndexMode.LOOKUP.

That's something we can easily add as a method on Join.java, and can be added to this PR. (We do this check all the time; it should've become a helper method a long time ago, we sillies.)
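Editor's sketch of the check described above: walk down the right-hand side of the join, pick up the relation, and test its index mode. This is a rough illustration under stated assumptions, not the helper that exists or was added in Elasticsearch. `ToyIndexMode`, `ToyNode`, `ToyRelation`, `ToyWrapper`, `ToyBinaryJoin` and `LookupJoinCheck` are hypothetical stand-ins for `IndexMode`, the logical plan nodes, `EsRelation` and `Join`.

```java
import java.util.Optional;

// Hypothetical stand-in for IndexMode; only the two values needed here.
enum ToyIndexMode { STANDARD, LOOKUP }

// Hypothetical toy logical plan nodes (not the ES|QL classes).
interface ToyNode {}
record ToyRelation(String index, ToyIndexMode mode) implements ToyNode {}
record ToyWrapper(String name, ToyNode child) implements ToyNode {}
record ToyBinaryJoin(ToyNode left, ToyNode right) implements ToyNode {
    // A join counts as a lookup join when the relation on its right side has LOOKUP mode.
    boolean isLookupJoin() {
        return LookupJoinCheck.firstRelation(right)
            .map(relation -> relation.mode() == ToyIndexMode.LOOKUP)
            .orElse(false);
    }
}

final class LookupJoinCheck {
    // Descends through unary wrapper nodes until a relation (or something else) is reached.
    static Optional<ToyRelation> firstRelation(ToyNode node) {
        ToyNode current = node;
        while (current instanceof ToyWrapper wrapper) {
            current = wrapper.child();
        }
        return current instanceof ToyRelation relation ? Optional.of(relation) : Optional.empty();
    }
}
```

The sketch answers "no" whenever the right side does not end in a single relation, which keeps the check restrictive by default.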

Contributor

Sorry, I didn't realize we're inside the local physical optimizer, d'oh.

In terms of physical plans, we luckily have LookupJoinExec and don't have a general JoinExec. (There's HashJoinExec for INLINE STATS). So I'm not sure what exactly the problem is right here. We could throw or assert on a list of allowed physical plans (actually that'd be a bit safer, as physical optimization could shove some weird stuff before the STATS); but the question seems to be around the list of allowed logical plans, right? Yeah, for those, my comment around identifying lookup joins stands.
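Editor's sketch of the "throw or assert on a list of allowed physical plans" idea mentioned above. It is only an illustration under stated assumptions: the node types (`ToyPhysNode`, `ToyScanExec`, `ToyEvalExec`, `ToyLookupJoinExec`, `ToyExchangeExec`) and the `AllowedPhysicalPlans` class are hypothetical, and the contents of the allowlist are made up for the example rather than taken from this PR.

```java
import java.util.List;
import java.util.Set;

// Hypothetical toy physical plan nodes (not the ES|QL physical plan classes).
interface ToyPhysNode {
    List<ToyPhysNode> children();
}
record ToyScanExec(String index) implements ToyPhysNode {
    public List<ToyPhysNode> children() { return List.of(); }
}
record ToyEvalExec(ToyPhysNode child) implements ToyPhysNode {
    public List<ToyPhysNode> children() { return List.of(child); }
}
record ToyLookupJoinExec(ToyPhysNode left, ToyPhysNode right) implements ToyPhysNode {
    public List<ToyPhysNode> children() { return List.of(left, right); }
}
record ToyExchangeExec(ToyPhysNode child) implements ToyPhysNode {
    public List<ToyPhysNode> children() { return List.of(child); }
}

final class AllowedPhysicalPlans {
    // Illustrative allowlist; anything not listed is treated as unsupported.
    static final Set<Class<? extends ToyPhysNode>> ALLOWED =
        Set.of(ToyScanExec.class, ToyEvalExec.class, ToyLookupJoinExec.class);

    // True only if the node and all of its descendants are on the allowlist.
    static boolean isSupported(ToyPhysNode plan) {
        return ALLOWED.contains(plan.getClass())
            && plan.children().stream().allMatch(AllowedPhysicalPlans::isSupported);
    }

    // Fails fast, so an unexpected node cannot be sampled by accident.
    static void check(ToyPhysNode plan) {
        if (isSupported(plan) == false) {
            throw new IllegalStateException("plan contains a node that is not on the allowlist: " + plan);
        }
    }
}
```

With an allowlist, a node type introduced later is rejected until someone adds it deliberately, which matches the "as restrictive as possible" position taken in this thread.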

Contributor Author

OK, I've added an assertion now. It's all good. Thanks

@jan-elastic jan-elastic force-pushed the esql-approximate-with-join branch from dd6607b to 1705aac Compare April 14, 2026 10:26
@jan-elastic jan-elastic requested a review from astefan April 14, 2026 10:55
Contributor

@astefan astefan left a comment

There are some interesting edge cases that must be explored imho.
The one that I did test came from this PR where pruning inline stats in some scenarios leads to code paths that are rarely explored.

In the PruneColumns.pruneColumnsInAggregate method, I think there is a "silent" inlineJoin=true when the aggregate is a SampledAggregate, and I don't think that's the intention.

Running

SET approximation={"rows":10000};
FROM employees
| INLINE STATS x = MAX(salary) WHERE false, c = COUNT(*) BY emp_no
| KEEP x, c
| SORT x, c
| LIMIT 3

stops the node for me with this exception

[ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [runTask-0] fatal error in thread [elasticsearch[runTask-0][search_coordination][T#1]], exiting java.lang.ExceptionInInitializerError
        at org.elasticsearch.xpack.esql.optimizer.rules.logical.SubstituteApproximationPlan.apply(SubstituteApproximationPlan.java:27)
        at org.elasticsearch.xpack.esql.optimizer.rules.logical.SubstituteApproximationPlan.apply(SubstituteApproximationPlan.java:20)
        at org.elasticsearch.xpack.esql.rule.ParameterizedRuleExecutor.lambda$transform$0(ParameterizedRuleExecutor.java:29)
        at org.elasticsearch.xpack.esql.rule.RuleExecutor.execute(RuleExecutor.java:128)
        at org.elasticsearch.xpack.esql.optimizer.LogicalPlanOptimizer.optimize(LogicalPlanOptimizer.java:137)
        at org.elasticsearch.xpack.esql.session.EsqlSession.optimizedPlan(EsqlSession.java:1598)

I did also try a query from this other PR - #135011 - and got the same stacktrace as above. Maybe I missed something in my tests...

@jan-elastic
Contributor Author

jan-elastic commented Apr 14, 2026

I can't reproduce your failure.

When running that query, I get the expected

#! line 4:20: approximation not supported: aggregation function [MAX(salary)] cannot be approximated
       x       |       c       
---------------+---------------
null           |1              
null           |1              
null           |1     

(tried both a locally running cluster, and CsvIT)

@jan-elastic jan-elastic force-pushed the esql-approximate-with-join branch from 1705aac to 4883422 Compare April 14, 2026 15:22
Labels

>feature :ml Machine learning Team:ML Meta label for the ML team v9.5.0

6 participants